NSF PAR Search | NSF Public Access Repository

You Only Debias Once: Towards Flexible Accuracy-Fairness Trade-offs at Inference Time

Han, Xiaotian; Chen, Tianlong; Zhou, Kaixiong; Jiang, Zhimeng; Wang, Zhengyang; Hu, Xia (February 2025, The Second Conference on Parsimony and Learning)

Free, publicly-accessible full text available February 11, 2026
BlendFilter: Advancing Retrieval-Augmented Large Language Models via Query Generation Blending and Knowledge Filtering

Wang, Haoyu; Li, Ruirui; Jiang, Haoming; Tian, Jinjin; Wang, Zhengyang; Luo, Chen; Tang, Xianfeng; Cheng, Monica Xiao; Zhao, Tuo; Gao, Jing (November 2024, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing)

Free, publicly-accessible full text available November 12, 2025
LightToken: A Task and Model-agnostic Lightweight Token Embedding Framework for Pre-trained Language Models

Wang, Haoyu; Li, Ruirui; Jiang, Haoming; Wang, Zhengyang; Tang, Xianfeng; Bi, Bin; Cheng, Monica; Yin, Bing; Wang, Yaqing; Zhao, Tuo; et al (August 2024, KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Full Text Available
LightLT: A Lightweight Representation Quantization Framework for Long-Tail Data

https://doi.org/10.1109/ICDE60146.2024.00114

Wang, Haoyu; Li, Ruirui; Wang, Zhengyang; Tang, Xianfeng; Zhang, Danqing; Cheng, Monica; Yin, Bing; Droppo, Jasha; Wang, Suhang; Gao, Jing (May 2024, Proceedings of IEEE International Conference on Data Engineering (ICDE))

Full Text Available
Towards Unified Multi-Modal Personalization: Large Vision-Language Models for Generative Recommendation and Beyond

Wei, Tianxin; Jin, Bowen; Li, Ruirui; Zeng, Hansi; Wang, Zhengyang; Sun, Jianhui; Yin, Qingyu; Lu, Hanqing; Wang, Suhang; He, Jingrui; et al (May 2024, ICLR)

Full Text Available
Data Diversity Matters for Robust Instruction Tuning

https://doi.org/10.18653/v1/2024.findings-emnlp.195

Bukharin, Alexander; Li, Shiyang; Wang, Zhengyang; Yang, Jingfeng; Yin, Bing; Li, Xian; Zhang, Chao; Zhao, Tuo; Jiang, Haoming (January 2024, Association for Computational Linguistics)

Full Text Available
A Unified Framework of Graph Information Bottleneck for Robustness and Membership Privacy

https://doi.org/10.1145/3580305.3599248

Dai, Enyan; Cui, Limeng; Wang, Zhengyang; Tang, Xianfeng; Wang, Yinghan; Cheng, Monica; Yin, Bing; Wang, Suhang (August 2023, In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023))

Full Text Available
Tab-Cleaner: Weakly Supervised Tabular Data Cleaning via Pre-training for E-commerce Catalog

https://doi.org/10.18653/v1/2023.acl-industry.18

Cheng, Kewei; Li, Xian; Wang, Zhengyang; Zhang, Chenwei; Huang, Binxuan; Xu, Yifan Ethan; Dong, Xin Luna; Sun, Yizhou (July 2023, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics)

Product catalogs, conceptually in the form of text-rich tables, are self-reported by individual retailers and thus inevitably contain noisy facts. Verifying such textual attributes in product catalogs is essential to improve their reliability. However, popular methods for processing free-text content, such as pre-trained language models, are not particularly effective on structured tabular data since they are typically trained on free-form natural language text. In this paper, we present Tab-Cleaner, a model designed to handle error detection over text-rich tabular data following a pre-training / fine-tuning paradigm. We train Tab-Cleaner on a real-world Amazon Product Catalog table covering millions of products and show improvements over state-of-the-art methods of 16% in PR AUC on the attribute applicability classification task and 11% in PR AUC on the attribute value validation task.
Full Text Available
LightToken: A Task and Model-agnostic Lightweight Token Embedding Framework for Pre-trained Language Models

Wang, Haoyu; Li, Ruirui; Jiang, Haoming; Wang, Zhengyang; Tang, Xianfeng; Bi, Bin; Cheng, Monica; Yin, Bing; Wang, Yaqing; Zhao, Tuo; et al (August 2023, KDD '23: Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining)

Full Text Available
Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs

https://doi.org/10.18653/v1/2023.findings-acl.642

Huang, Zijie; Wang, Daheng; Huang, Binxuan; Zhang, Chenwei; Shang, Jingbo; Liang, Yan; Wang, Zhengyang; Li, Xian; Faloutsos, Christos; Sun, Yizhou; et al (July 2023, Findings of the Association for Computational Linguistics: ACL 2023)

Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact that many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric representation fails to capture the structural differences between the two views and lacks probabilistic semantics for concepts’ granularity. We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations. We model concepts with box embeddings, which learn the hierarchy structure and complex relations among them, such as overlap and disjointness. Box volumes can be interpreted as concepts’ granularity. Unlike concepts, entities are modeled as vectors. To bridge the gap between concept box embeddings and entity vector embeddings, we propose a novel vector-to-box distance metric and learn both embeddings jointly. Experiments on both the public DBpedia KG and a newly created industrial KG show the effectiveness of Concept2Box.
Full Text Available
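
The Concept2Box abstract above describes concepts as boxes whose volume reflects granularity and entities as vectors, linked by a vector-to-box distance. The following is a minimal sketch of that general idea only, not the paper's method: the `Box` class, the `point_to_box_distance` function, and the example coordinates are all hypothetical, and the distance shown is the plain Euclidean distance from a point to an axis-aligned box, which may differ from the metric the authors propose.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned box given by its lower and upper corners (hypothetical helper)."""
    lower: Sequence[float]
    upper: Sequence[float]

    def volume(self) -> float:
        # Product of side lengths; a larger volume stands for a coarser concept.
        return math.prod(max(u - l, 0.0) for l, u in zip(self.lower, self.upper))


def point_to_box_distance(point: Sequence[float], box: Box) -> float:
    # Euclidean distance from the point to the nearest point of the box.
    # It is zero when the point lies inside the box or on its boundary.
    # Assumption: this stands in for the paper's vector-to-box metric.
    gaps = (max(l - x, 0.0, x - u) for x, l, u in zip(point, box.lower, box.upper))
    return math.sqrt(sum(g * g for g in gaps))


# Illustrative values: a coarse concept box that contains a finer one.
animal = Box(lower=[0.0, 0.0], upper=[4.0, 2.0])
dog = Box(lower=[1.0, 0.5], upper=[2.0, 1.5])

print(animal.volume(), dog.volume())
print(point_to_box_distance([1.5, 1.0], dog))  # entity vector inside the box
print(point_to_box_distance([5.0, 5.5], dog))  # entity vector outside the box
```

In this sketch the containing box has the larger volume, and an entity vector inside a concept box has distance zero to it, which mirrors the intuition that the entity belongs to that concept.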
